{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#  Area Under the Curve\n",
    "\n",
    "You are going to use Python to calculate probabilities associated with a Gaussian distribution. In other words, you'll learn to calculate the area under the probability density function.\n",
    "\n",
    "# SciPy Library\n",
    "\n",
    "Python has a library called [SciPy](https://www.scipy.org/scipylib/index.html), which is used for running scientific and mathematical computations. You are specifically going to use a part of the library that implements Gaussian distributions.\n",
    "\n",
    "# Demo: Probability Density Function\n",
    "\n",
    "The code cell below calculates a Gaussian probability density function two ways. First, we're using the density function from the previous exercises. Then, we compare the results using the SciPy library's implementation.\n",
    "\n",
    "You'll see that the results are exactly the same."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from scipy.stats import norm\n",
    "import numpy as np\n",
    "\n",
    "# our solution to calculate the probability density function\n",
    "def gaussian_density(x, mu, sigma):\n",
    "    return (1/np.sqrt(2*np.pi*np.power(sigma, 2.))) * np.exp(-np.power(x - mu, 2.) / (2 * np.power(sigma, 2.)))\n",
    "\n",
    "print(\"Probability density function our solution: mu = 50, sigma = 10, x = 50\")\n",
    "print(gaussian_density(50, 50, 10))\n",
    "print(\"\\nProbability density function SciPy: mu = 50, sigma = 10, x = 50\")\n",
    "print(norm(loc = 50, scale = 10).pdf(50))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You might wonder why the Gaussian distribution is called \"norm\" in the SciPy library; it's because the Gaussian distribution is also called the normal distribution. \n",
    "\n",
    "Also, note that to initialize the distribution, the loc keyword is the mean and the scale keyword is the standard deviation.\n",
    "\n",
    "# Calculating Probability\n",
    "\n",
    "The area under the probability density function represents probability. Your job will be to write a function that calculates the probability between two x-values. For example, using the winter San Francisco temperature example, what is the probability that the temperature is between 30 degrees and 50 degrees? \n",
    "\n",
    "The SciPy library has a function that calculates the area under the curve for you. It's called cdf ([cumulative density function](https://en.wikipedia.org/wiki/Cumulative_distribution_function)). You can use the cdf SciPy method in a similar way to the pdf method. Run the code cell below to see an example."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "norm(loc = 50, scale = 10).cdf(50)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Why is the output 0.5? The cdf method gives you the area under the curve from x = -infinity through the input, which in this case was 50. The area under the curve is 0.5 meaning there is a 50% chance that the temperature is between -infinity and 50 degrees.\n",
    "\n",
    "Run the code cell below to see a visualization of the area under the curve from -infinity to 50."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline  \n",
    "\n",
    "### \n",
    "# The plot_fill function plots a probability density function and also\n",
    "# shades the area under the curve between x_prob_min and x_prob_max.\n",
    "# INPUTS:\n",
    "# x: x-axis values for plotting\n",
    "# x_prob_min: minimum x-value for shading the visualization\n",
    "# x_prob_max: maximum x-value for shading the visualization\n",
    "# y_lim: the highest y-value to show on the y-axis\n",
    "# title: visualization title\n",
    "#\n",
    "# OUTPUTS:\n",
    "# prints out a visualization\n",
    "### \n",
    "\n",
    "def plot_fill(x, x_prob_min, x_prob_max, y_lim, title):\n",
    "    # Calculate y values of the probability density function\n",
    "    # Note that the pdf method can accept an array of values from numpy linspace.\n",
    "    y = norm(loc = 50, scale = 10).pdf(x)\n",
    "    \n",
    "    # Calculate values for filling the area under the curve\n",
    "    x_fill = np.linspace(x_prob_min, x_prob_max, 1000)\n",
    "    y_fill = norm(loc = 50, scale = 10).pdf(x_fill)\n",
    "    \n",
    "    # Plot the results\n",
    "    plt.plot(x, y)\n",
    "    plt.fill_between(x_fill, y_fill)\n",
    "    plt.title(title)\n",
    "    plt.ylim(0, y_lim)\n",
    "    plt.xticks(np.linspace(0, 100, 21))\n",
    "    plt.xlabel('Temperature (Fahrenheit)')\n",
    "    plt.ylabel('probability density function')\n",
    "    plt.show()\n",
    "\n",
    "average = 50\n",
    "stdev = 10\n",
    "y_lim = 0.05\n",
    "x = np.linspace(0, 100, 1000)\n",
    "\n",
    "plot_fill(x, 0, 50, y_lim,\n",
    "          'Gaussian Distribution, Average = ' + str(average) + ', Stdev ' + str(stdev))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# More Examples\n",
    "\n",
    "Here are a few more examples of the cdf method. The code cell below prints out the probability that the temperature is between:\n",
    "* -infinity and 25\n",
    "* -infinity and 75\n",
    "* -infinity and 125\n",
    "* -infinity and +infinity"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(norm(loc = 50, scale = 10).cdf(25))\n",
    "print(norm(loc = 50, scale = 10).cdf(75))\n",
    "print('%.20f' % norm(loc = 50, scale = 10).cdf(125)) # '%.20f' prints out 20 decimal places\n",
    "print(norm(loc = 50, scale = 10).cdf(float('inf')))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What if you wanted to know the probability of the temperature being between 25 and +infinity? Or 75 and +infinity? \n",
    "\n",
    "Well, you know that the area under the curve from -infinity to +infinity is equal to 1. And the cdf function can return the probability from -infinity to 25. \n",
    "\n",
    "So if you wanted to know the probability from 25 to +infinity, you could do the following calculation:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(1 - norm(loc = 50, scale = 10).cdf(25))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice that for this particular Gaussian distribution, the probability that temperature is between -infinity and 75 is the same as the probability from 25 to +infinity. This is due to the symmetry of the Gaussian distribution: 25 and 75 are both 25 away from the mean of 50."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Exercise\n",
    "\n",
    "What if you wanted to know the probability that the temperature is between 25 and 75? Based on what you've seen so far, you have all the information you need to write a function that does this calculation. \n",
    "\n",
    "Run the code cell below to get a hint for how to do this. The code cell outputs visualizations for the area under the curve from -infinity to 25, then -infinity to 75, and finally between 25 and 75. \n",
    "\n",
    "Remember that the cdf method calculates area under the curve from -infinity up to the number you specify. How can you use this information to calculate the probability **between** 25 and 75? Remember that the probability between 25 and 75 is equal to the area under the curve between 25 and 75."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "average = 50\n",
    "stdev = 10\n",
    "y_lim = 0.05\n",
    "x = np.linspace(0, 100, 1000)\n",
    "\n",
    "plot_fill(x, 0, 25, y_lim,\n",
    "          'Gaussian Distribution, Average = ' + str(average) + ', Stdev ' + str(stdev))\n",
    "\n",
    "plot_fill(x, 0, 75, y_lim,\n",
    "          'Gaussian Distribution, Average = ' + str(average) + ', Stdev ' + str(stdev))\n",
    "\n",
    "plot_fill(x, 25, 75, y_lim,\n",
    "          'Gaussian Distribution, Average = ' + str(average) + ', Stdev ' + str(stdev))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Fill in your code below. The function has four inputs and one output. Here are the inputs:\n",
    "\n",
    "**Inputs**\n",
    "* mean - Gaussian distribution mean\n",
    "* stdev - Gaussian distribution standard deviation\n",
    "* x_low - low end of the probability range\n",
    "* x_high - high end of the probability range\n",
    "\n",
    "**Output**\n",
    "* probability that the x value is between x_low and x_high\n",
    "\n",
    "We've provided a solution in the next node of the lesson title \"Calculating Area Under the Curve in Python [Solution]\"."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def gaussian_probability(mean, stdev, x_low, x_high):\n",
    "    # TODO: return the Gaussian distribution probability\n",
    "    # that the x-value is between x_low and x_high\n",
    "    \n",
    "    # Use the SciPy library norm.cdf method\n",
    "    return 0"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Run the code cell below to test your results. A solution is provided in the next part of the lesson."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "assert float('{0:.8f}'.format(gaussian_probability(50, 10, 25, 75))) == 0.98758067\n",
    "assert float('{0:.2f}'.format(gaussian_probability(50, 10, float('-inf'), float('inf')))) == 1.0\n",
    "assert float('{0:.8f}'.format(gaussian_probability(50, 10, 20, 60))) == 0.83999485\n",
    "print('All tests passed')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
